pacman::p_load(sf, terra, spatstat,
tmap, rvest, tidyverse)Hands-on Exercise 3
1 Overview
Spatial Point Pattern Analysis (SPPA) - The pattern or distribution of a set of points on a surface.
First-Order SPPA (1st-SPPA) - Intensity/density of points across a study area.
Launching five R packages below:
spatstat: for point pattern analysisterra: modern spatial data analysis package, used to convert image output generate by spatstat into terra formatrvest: for scraping of data from web pages
2 Importing and Wrangling Datasets
Import the MasterPlan2019 data. st_zm() is used to remove Z (elevation) and M (measure) dimensions from geospatial geometries.
mpsz_sf <- st_read("data/geospatial/MasterPlan2019SubzoneBoundaryNoSeaKML.kml") %>%
st_zm(drop = TRUE, what = "ZM") %>% st_transform(crs = 3414)Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source
`C:\emmachew\ISSS626-emma-chew-en-chin\Hands-on Exercise\Hands-on_Ex03\data\geospatial\MasterPlan2019SubzoneBoundaryNoSeaKML.kml'
using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Build a function to extract attributes REGION_N, PLN_AREA_N, SUBZONE_N, SUBZONE_C from Description field.
extract_kml_field <- function(html_text, field_name) {
if (is.na(html_text) || html_text == "") return(NA_character_)
page <- read_html(html_text)
rows <- page %>% html_elements("tr")
value <- rows %>%
keep(~ html_text2(html_element(.x, "th")) == field_name) %>%
html_element("td") %>%
html_text2()
if (length(value) == 0) NA_character_ else value
}mpsz_sf <- mpsz_sf %>%
mutate(
REGION_N = map_chr(Description, extract_kml_field, "REGION_N"),
PLN_AREA_N = map_chr(Description, extract_kml_field, "PLN_AREA_N"),
SUBZONE_N = map_chr(Description, extract_kml_field, "SUBZONE_N"),
SUBZONE_C = map_chr(Description, extract_kml_field, "SUBZONE_C")
) %>%
select(-Name, -Description) %>%
relocate(geometry, .after = last_col())mpsz_cl <- mpsz_sf %>%
filter(SUBZONE_N != "SOUTHERN GROUP",
PLN_AREA_N != "WESTERN ISLANDS",
PLN_AREA_N != "NORTH-EASTERN ISLANDS")write_rds(mpsz_cl,
"data/mpsz_cl.rds")Import the ChildCareService data.
childcare_sf <- st_read("data/geospatial/ChildCareServices.kml") %>%
st_zm(drop = TRUE, what = "ZM") %>%
st_transform(crs = 3414)Reading layer `CHILDCARE' from data source
`C:\emmachew\ISSS626-emma-chew-en-chin\Hands-on Exercise\Hands-on_Ex03\data\geospatial\ChildCareServices.kml'
using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Plotting a static map of childcare_sf.
plot(st_geometry(mpsz_sf), col = "grey")
plot(st_geometry(childcare_sf), add = TRUE, pch = 19, col = "black", cex = 0.6)
Plotting an interactive map of childcare_sf.
tmap_mode('view')
tm_shape(childcare_sf)+
tm_dots()tmap_mode('plot')3 Geospatial Data Wrangling
Convert sf (Simple Features) objects into spatstat ppp and owin objects.
3.1 Conversting sf dataframes to ppp class
Using spatstat to convert point event data in **ppp* object form.
Next, use class() to verify the object class.
childcare_ppp <- as.ppp(childcare_sf)class(childcare_ppp)[1] "ppp"
summary(childcare_ppp)Marked planar point pattern: 1925 points
Average intensity 2.417323e-06 points per square unit
Coordinates are given to 11 decimal places
Mark variables: Name, Description
Summary:
Name Description
Length:1925 Length:1925
Class :character Class :character
Mode :character Mode :character
Window: rectangle = [11810.03, 45404.24] x [25596.33, 49300.88] units
(33590 x 23700 units)
Window area = 796335000 square units
3.2 Creating owin object
owin object is used to represent a polygonal region (e.g. Singapore boundary)
Using as.owin() to convert mpsz_sf into owin object.
Next, use class() to verify the object class.
sg_owin <- as.owin(mpsz_cl)class(sg_owin)[1] "owin"
plot(sg_owin)
3.3 Combining point events object and owin object
Combine both point and polygon feature into one ppp object class.
childcareSG_ppp = childcare_ppp[sg_owin]childcareSG_pppMarked planar point pattern: 1925 points
Mark variables: Name, Description
window: polygonal boundary
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
4 Clark-Evan Test for Nearest Neighbour Analysis (NNA)
Nearest Neighbor Analysis (NNA) is a spatial statistics method that calculates the average distance between each point and its closest neighbor to determine if a pattern of points is clustered, dispersed, or randomly distributed.
Clark-Evans test of aggregation is performed using clarkevans.test() of the spatstat.explore package.
The test hypotheses are:
Ho = The distribution of childcare services are randomly distributed.
H1 = The distribution of childcare services are not randomly distributed.
The 95% confident interval will be used.
4.1. Performing the Clark-Evans test without CSR
clarkevans.test(childcareSG_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"))
Clark-Evans test
No edge correction
Z-test
data: childcareSG_ppp
R = 0.53532, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
4.2. Performing the Clark-Evans test with CSR
Argument method = “MonteCarlo” is used. The p-value for the test is computed by comparing the observed value of R to the results obtained from nsim (i.e. 39, 99, 999) simulated realisations of Complete Spatial Randomness (CSR) conditional on the observed number of points.
clarkevans.test(childcareSG_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
method="MonteCarlo",
nsim=99)
Clark-Evans test
No edge correction
Monte Carlo test based on 99 simulations of CSR with fixed n
data: childcareSG_ppp
R = 0.53532, p-value = 0.01
alternative hypothesis: clustered (R < 1)
From the results, we can conclude at a 95% statistical significance level that we can reject the null hypothesis that the distribution of childcare services are randomly distributed.
This shows that there is a pattern to which childcare services are distributed.
5 Kernel Density Estimation Method (KDE)
Kernel Density Estimation (KDE) visualises and analyses first-order spatial point patterns.
It transforms discrete point data into continuous density surfaces that reveal clusters and variations in event occurrences, without making prior assumptions about data distribution.
Use cases:
- Data distribution understanding
- Identification of hotspots
- Explore relationships between spatial variables
5.1 Working with automatic bandwidth selection method
We can compute a kernel density using different bandwidth selection methods:
bw.diggle(): Adaptive bandwidth suitable for clustered patternsbw.CvL(): Cross-validation based methodbw.scott(): Rule-of-thumb bandwidth (good for normal distributions)bw.ppl(): Pilot bandwidth selection
Each method can be chosen based on the data characteristics and level of smoothing to be achieved.
Below, we use bw.diggle(), and use the smoothing kernel “gaussian”.
kde_SG_diggle <- density(
childcareSG_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian") plot(kde_SG_diggle)
Note: The density values of the output range from 0 to 0.00003727443 which is way too small to comprehend. This is because the default unit of measurement of svy21 is in meter. As a result, the density values computed is in “number of points per square meter”.
summary(kde_SG_diggle)real-valued pixel image
128 x 128 pixel array (ny, nx)
enclosing rectangle: [2667.538, 55941.94] x [21448.47, 50256.33] units
dimensions of each pixel: 416 x 225.0614 units
Image is defined on a subset of the rectangular grid
Subset area = 669941961.12249 square units
Subset area fraction = 0.437
Pixel values (inside window):
range = [-6.584123e-21, 3.063698e-05]
integral = 1927.788
mean = 2.877545e-06
We can retrieve the bandwidth used to compute the kde layer as below.
bw <- bw.diggle(childcareSG_ppp)
bw sigma
295.9712
5.2 Rescaling KDE values
rescale.ppp() is used to convert the unit of measurement from meter to kilometer.
childcareSG_ppp_km <- rescale.ppp(
childcareSG_ppp, 1000, "km")Now, we re-run density() and plot the output kde map. Output image remains the same as the earlier output, but only the data values on the legend have changed.
kde_childcareSG_km <- density(childcareSG_ppp_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")plot(kde_childcareSG_km)
5.3 Working with different automatic bandwidth methods
Trying the other functions used to determine the bandwidth.
bw.CvL(childcareSG_ppp_km) sigma
4.357209
bw.scott(childcareSG_ppp_km) sigma.x sigma.y
2.159749 1.396455
bw.ppl(childcareSG_ppp_km) sigma
0.378997
bw.diggle(childcareSG_ppp_km) sigma
0.2959712
bw.ppl() can be used to produce more appropriate values when the pattern consists predominantly of tight clusters.
bw.diggle() can be used to study and detect a single tight cluster in the midst of random noise.
kde_childcareSG.ppl <- density(childcareSG_ppp_km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_childcareSG_km, main = "bw.diggle")
plot(kde_childcareSG.ppl, main = "bw.ppl")
5.4 Working with different kernel methods
By default, the kernel method used in density.ppp() is gaussian.
There are 3 other options:
- Epanechnikov
- Quartic
- Dics
par(mfrow=c(2,2), mar=c(1,2,2,2))
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="gaussian"),
main="Gaussian")
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="epanechnikov"),
main="Epanechnikov")
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="quartic"),
main="Quartic")
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="disc"),
main="Disc")
6 Fixed and Adaptive KDE
6.1 Computing KDE by using fixed bandwidth
A KDE layer is computed by defining a bandwidth of 600 meter.
The sigma value used is 0.6, because the unit of measurement of childcareSG_ppp_km object is in kilometer, hence the 600m is 0.6km.
kde_childcareSG_fb <- density(childcareSG_ppp_km,
sigma=0.6,
edge=TRUE,
kernel="gaussian")
plot(kde_childcareSG_fb)
6.2 Computing KDE by using adaptive bandwidth
Fixed bandwidth method is very sensitive to highly skew distribution of spatial point patterns over geographical units for example urban versus rural. One way to overcome this problem is by using adaptive bandwidth instead.
kde_childcareSG_ab <- adaptive.density(
childcareSG_ppp_km,
method="kernel")
plot(kde_childcareSG_ab)
Comparing the fixed and adaptive kernel density estimation output below.
par(mfrow=c(1,2), mar=c(2,2,2,1))
plot(kde_childcareSG_fb, main = "Fixed bandwidth")
plot(kde_childcareSG_ab, main = "Adaptive bandwidth")
7 Plotting cartographic quality KDE map
7.1 Converting gridded output into raster
Converting the kernel density objects into SpatRaster objects, using rast() of the terra package.
kde_childcareSG_bw_terra <- rast(kde_childcareSG_km)Using class() to verify if kde_childcareSG_bw_terra data belongs to the SpatRaster class.
class(kde_childcareSG_bw_terra)[1] "SpatRaster"
attr(,"package")
[1] "terra"
The crs property is empty.
kde_childcareSG_bw_terraclass : SpatRaster
size : 128, 128, 1 (nrow, ncol, nlyr)
resolution : 0.4162063, 0.2250614 (x, y)
extent : 2.667538, 55.94194, 21.44847, 50.25633 (xmin, xmax, ymin, ymax)
coord. ref. :
source(s) : memory
name : lyr.1
min value : -5.824417e-15
max value : 3.063698e+01
unit : km
7.2 Assigning projection systems
We have to assign the CRS information back on kde_childcareSG_bw_terra layer.
crs(kde_childcareSG_bw_terra) <- "EPSG:3414"How, the coordinates reference is in SVY21.
kde_childcareSG_bw_terraclass : SpatRaster
size : 128, 128, 1 (nrow, ncol, nlyr)
resolution : 0.4162063, 0.2250614 (x, y)
extent : 2.667538, 55.94194, 21.44847, 50.25633 (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414)
source(s) : memory
name : lyr.1
min value : -5.824417e-15
max value : 3.063698e+01
unit : km
7.3 Plotting KDE map with tmap
Displaying the raster in cartographic quality map using the tmap package.
Raster values are encoded explicitly onto the raster pixel using the values in “layer.1” field.
tm_shape(kde_childcareSG_bw_terra) +
tm_raster(col.scale =
tm_scale_continuous(
values = "viridis"),
col.legend = tm_legend(
title = "Density values",
title.size = 0.7,
text.size = 0.7,
bg.color = "white",
bg.alpha = 0.7,
position = tm_pos_in(
"right", "bottom"),
frame = TRUE)) +
tm_graticules(labels.size = 0.7) +
tm_compass() +
tm_layout(scale = 1.0)
8 First Order SPPA at the Planning Subzone Level
We will analyse at the planning area level.
8.1 Geospatial data wrangling
8.1.1 Extracting study area
Extract target planning areas.
pg <- mpsz_cl %>%
filter(PLN_AREA_N == "PUNGGOL")
tm <- mpsz_cl %>%
filter(PLN_AREA_N == "TAMPINES")
ck <- mpsz_cl %>%
filter(PLN_AREA_N == "CHOA CHU KANG")
jw <- mpsz_cl %>%
filter(PLN_AREA_N == "JURONG WEST")Review extracted areas.
par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(st_geometry(pg), main = "Ponggol")
plot(st_geometry(tm), main = "Tampines")
plot(st_geometry(ck), main = "Choa Chu Kang")
plot(st_geometry(jw), main = "Jurong West")
8.1.2 Create owin object
Convert sf objects into owin.
pg_owin = as.owin(pg)
tm_owin = as.owin(tm)
ck_owin = as.owin(ck)
jw_owin = as.owin(jw)8.1.3 Combining point events object and owin object
childcare_pg_ppp = childcare_ppp[pg_owin]
childcare_tm_ppp = childcare_ppp[tm_owin]
childcare_ck_ppp = childcare_ppp[ck_owin]
childcare_jw_ppp = childcare_ppp[jw_owin]Use rescale.ppp() to transform the unit of measurement from metre to kilometre.
childcare_pg_ppp.km = rescale.ppp(childcare_pg_ppp, 1000, "km")
childcare_tm_ppp.km = rescale.ppp(childcare_tm_ppp, 1000, "km")
childcare_ck_ppp.km = rescale.ppp(childcare_ck_ppp, 1000, "km")
childcare_jw_ppp.km = rescale.ppp(childcare_jw_ppp, 1000, "km")Plotting the 4 study areas and the locations of the childcare centres.
par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(unmark(childcare_pg_ppp.km),
main="Punggol")
plot(unmark(childcare_tm_ppp.km),
main="Tampines")
plot(unmark(childcare_ck_ppp.km),
main="Choa Chu Kang")
plot(unmark(childcare_jw_ppp.km),
main="Jurong West")
8.2 Conduct Clark and Evans Test
8.2.1 Choa Chu Kang planning area
clarkevans.test(childcare_ck_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_ck_ppp
R = 0.84097, p-value = 0.008866
alternative hypothesis: two-sided
8.2.2 Tampines planning area
clarkevans.test(childcare_tm_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_tm_ppp
R = 0.66817, p-value = 6.58e-12
alternative hypothesis: two-sided
8.3 Computing KDE surfaces by planning area
Computing the KDE of each planning area, using bw.diggle().
par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(density(childcare_pg_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Punggol")
plot(density(childcare_tm_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Tempines")
plot(density(childcare_ck_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Choa Chu Kang")
plot(density(childcare_jw_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Jurong West")